Add option for specifying case for identifiers #663
It's a good start, but needs some more work and thought.
src/formatter/ExpressionFormatter.ts
```diff
@@ -506,4 +506,19 @@ export default class ExpressionFormatter {
      return node.text.toLowerCase();
    }
  }

  private showIdentifier(node: IdentifierNode): string {
    if (/['"\\`]/.test(node.text[0]) || node.text.startsWith(`U&`)) {
```
This is a problematic way to tell quoted and unquoted identifiers apart. Whenever a new type of quoted identifier is added, one would also need to remember to update this code, which is far too easy to forget. For example, this code already fails with Transact-SQL [bracket-quoted] identifiers.

The lexer already has two tokens: `IDENTIFIER` and `QUOTED_IDENTIFIER`. Currently the parser simply throws this info away; we could instead store info about it inside the `IdentifierNode` object.
I agree, I had some trouble identifying quoted identifiers.
Your mention of `IDENTIFIER` and `QUOTED_IDENTIFIER` from the lexer was the key!
I followed your suggestion and changed the code to store the token type in the node for identifiers, and now only nodes with token type `IDENTIFIER` are converted.
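A minimal sketch of the token-type approach described above. It assumes a simplified `IdentifierNode` that carries the lexer's token type; the `TokenType` union, node shape and `showIdentifier` signature here are hypothetical stand-ins, not sql-formatter's actual types.

```typescript
// Hypothetical stand-ins for the lexer token types discussed above.
type TokenType = 'IDENTIFIER' | 'QUOTED_IDENTIFIER';

type IdentifierCase = 'preserve' | 'upper' | 'lower';

// Hypothetical node shape: the parser keeps the token type instead of discarding it.
interface IdentifierNode {
  type: 'identifier';
  tokenType: TokenType;
  text: string;
}

function showIdentifier(node: IdentifierNode, identifierCase: IdentifierCase): string {
  // Quoted identifiers keep their exact spelling, whatever quoting style
  // ("...", `...`, [...], U&"...") the lexer recognized them by.
  if (node.tokenType !== 'IDENTIFIER') {
    return node.text;
  }
  switch (identifierCase) {
    case 'upper':
      return node.text.toUpperCase();
    case 'lower':
      return node.text.toLowerCase();
    default:
      return node.text; // 'preserve'
  }
}
```

The check no longer inspects the first character of the text, so teaching the lexer a new quoting style would need no change here.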
```diff
@@ -286,7 +286,7 @@ export default class ExpressionFormatter {
  }

  private formatIdentifier(node: IdentifierNode) {
-    this.layout.add(node.text, WS.SPACE);
+    this.layout.add(this.showIdentifier(node), WS.SPACE);
```
In addition to plain identifiers, we also have several identifier-like things:

- variables
- parameters

These usually behave just like identifiers; the only thing distinguishing them from normal identifiers is a prefix, like `@myvar` or `:myparam`.

As the parser currently treats variables as identifiers, these too end up upper/lowercased. But parameters are kept separate by the parser, so their case doesn't change. Should parameters also change together with identifiers?
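To illustrate the asymmetry described above, here is a toy model. The `AstNode` union is a hypothetical simplification, not the real parser output: `@myvar` arrives as an identifier node while `:myparam` arrives as a separate parameter node, so only the former is affected by the case option.

```typescript
type IdentifierCase = 'preserve' | 'upper' | 'lower';

// Hypothetical, simplified AST: variables share the identifier node type,
// while parameters get their own node type.
type AstNode =
  | { type: 'identifier'; text: string }
  | { type: 'parameter'; text: string };

function changeCase(text: string, mode: IdentifierCase): string {
  if (mode === 'upper') return text.toUpperCase();
  if (mode === 'lower') return text.toLowerCase();
  return text;
}

function formatNode(node: AstNode, identifierCase: IdentifierCase): string {
  if (node.type === 'parameter') {
    // Parameters such as :myparam are a separate node type and are left alone.
    return node.text;
  }
  // Variables such as @myvar are parsed as identifiers, so their case changes too.
  return changeCase(node.text, identifierCase);
}
```

Making parameters follow `identifierCase` would then be a matter of routing the `parameter` branch through the same helper.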
My change is only for unquoted identifiers - I think variables and parameters are outside the scope and should be handled in another PR...
```ts
import { FormatFn } from '../../src/sqlFormatter.js';

export default function supportsIdentifierCase(format: FormatFn) {
```
There are likely more tests needed here to cover:

- that all identifiers in an expression like `foo.bar.baz` get uppercased
- variables
- parameters
- array names (treated differently by the lexer)
- various identifier quoting styles

As most of these things depend on the dialect, it might be better to just add additional tests to places like `supportsIdentifiers()`.
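The expectations listed above can be pinned down as input/expected pairs. The sketch below uses a deliberately tiny, hypothetical `applyIdentifierCase` (a regex tokenizer with a four-word keyword list, not sql-formatter's lexer) purely to make them concrete: every part of `foo.bar.baz` changes case, while string literals and double-quoted identifiers do not.

```typescript
type IdentifierCase = 'preserve' | 'upper' | 'lower';

// Hypothetical, tiny keyword list; a real dialect has hundreds.
const KEYWORDS = new Set(['SELECT', 'FROM', 'WHERE', 'AS']);

// Alternatives, in order: 'string', "quoted identifier", bare word, any other char.
const TOKEN = /'(?:[^']|'')*'|"(?:[^"]|"")*"|[A-Za-z_][A-Za-z0-9_]*|[\s\S]/g;

// Hypothetical helper: changes the case of bare words that are not keywords.
function applyIdentifierCase(sql: string, mode: IdentifierCase): string {
  return sql.replace(TOKEN, (tok) => {
    const isWord = /^[A-Za-z_]/.test(tok);
    if (!isWord || KEYWORDS.has(tok.toUpperCase()) || mode === 'preserve') {
      return tok; // strings, quoted identifiers, keywords, punctuation, whitespace
    }
    return mode === 'upper' ? tok.toUpperCase() : tok.toLowerCase();
  });
}
```

In the real test suite the same pairs would go through the dialect's `format` function instead of this toy.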
I added a test for multi-part identifiers in commit 558efa3
test/options/identifierCase.ts
`); | ||
}); | ||
|
||
it('does not uppercase identifiers inside strings', () => { |
nit: I wouldn't really call it an identifier inside a string - it's just a string.
I personally wouldn't add a separate test for this, I'd just include some strings inside the general "converts identifiers to lowercase/uppercase" test cases. But that's really a personal preference. Doesn't matter much either way.
The string test is now included in the identifier case tests and removed as a separate test.
```diff
@@ -0,0 +1,54 @@
# identifierCase

Converts identifiers to upper or lowercase.
```
This might be a good place to specify exactly what counts as an identifier and what doesn't.
I added a note in commit afe5b48 - is it okay?
Branch updated from 88ec87c to 558efa3.
This is much better now.
Should definitely fix the handling of `array[index]`.

Dealing with the casing of variables and parameters is, I think, out of scope for this PR. While I believe that this `identifierCase` option should also cover these, further work is needed in the lexer and parser to make it possible to distinguish between quoted and unquoted variants of both.

I'll also have to think about how this feature should interact with another feature I'd like to have: specifying the casing of functions. Function names are also identifiers, but I'd like to be able to say `functionCase: "upper", identifierCase: "lower"`.
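One way the two options could coexist, sketched under assumptions: the parser would need to mark an identifier as a function name (here a hypothetical `isFunctionName` flag), and the formatter would pick `functionCase` for those and `identifierCase` for everything else. Neither the flag nor a `functionCase` option exists in the code under review.

```typescript
type CaseOption = 'preserve' | 'upper' | 'lower';

interface CaseOptions {
  identifierCase: CaseOption;
  functionCase: CaseOption; // hypothetical future option
}

// Hypothetical node: the parser would have to record that the name is used as a function.
interface NameNode {
  text: string;
  quoted: boolean;
  isFunctionName: boolean;
}

function applyCase(text: string, option: CaseOption): string {
  if (option === 'upper') return text.toUpperCase();
  if (option === 'lower') return text.toLowerCase();
  return text;
}

function formatName(node: NameNode, options: CaseOptions): string {
  if (node.quoted) return node.text; // quoted names keep their exact spelling
  // The more specific option wins for function names.
  const option = node.isFunctionName ? options.functionCase : options.identifierCase;
  return applyCase(node.text, option);
}
```

With `{ functionCase: "upper", identifierCase: "lower" }`, `Coalesce(MyCol)` would come out as `COALESCE(mycol)`.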
```diff
@@ -202,7 +208,7 @@ atomic_expression ->
 array_subscript -> %ARRAY_IDENTIFIER _ square_brackets {%
   ([arrayToken, _, brackets]) => ({
     type: NodeType.array_subscript,
-    array: addComments({ type: NodeType.identifier, text: arrayToken.text}, { trailing: _ }),
+    array: addComments({ type: NodeType.identifier, tokenType: TokenType.ARRAY_IDENTIFIER, text: arrayToken.text}, { trailing: _ }),
```
So, currently `ARRAY_IDENTIFIER` tokens are treated differently from normal identifiers. This results in the following SQL:

```sql
select foo, foo[1] from tbl
```

being formatted as:

```sql
select
  FOO,
  foo[1]
from
  TBL
```

That is, when the `foo` column is used normally, it gets uppercased, but when used together with the array-accessor operator, it is not.

This difference between normal identifiers and array identifiers is really just an internal quirk of the SQL Formatter implementation. We should treat both as identifiers and change their case.
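A minimal sketch of the fix, assuming (as in the grammar change above) that the array name is stored as an identifier node carrying its token type. The formatter then treats both `IDENTIFIER` and `ARRAY_IDENTIFIER` as unquoted names. The token-type strings and the `caseIdentifier` helper are simplified, hypothetical stand-ins.

```typescript
type TokenType = 'IDENTIFIER' | 'ARRAY_IDENTIFIER' | 'QUOTED_IDENTIFIER';
type IdentifierCase = 'preserve' | 'upper' | 'lower';

// Token types whose text is an unquoted name and may therefore change case.
// ARRAY_IDENTIFIER is only an internal lexer distinction, so it is treated
// exactly like IDENTIFIER.
const UNQUOTED: ReadonlySet<TokenType> = new Set<TokenType>(['IDENTIFIER', 'ARRAY_IDENTIFIER']);

// Hypothetical helper: applies the case option to unquoted names only.
function caseIdentifier(tokenType: TokenType, text: string, mode: IdentifierCase): string {
  if (!UNQUOTED.has(tokenType) || mode === 'preserve') return text;
  return mode === 'upper' ? text.toUpperCase() : text.toLowerCase();
}
```

With this, both occurrences of `foo` in `select foo, foo[1] from tbl` would be uppercased to `FOO`.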
Array identifiers are now being converted together with ordinary identifiers - see commit 0bb7d21
```md
Note: An identifier is a name of a SQL object.
There are two types of SQL identifiers: ordinary identifiers and quoted identifiers.
Only ordinary identifiers are subject to be converted.
```
This is better, but according to this description I would expect the case of variables and parameters to also be converted, because they too are names of SQL objects.

In general I see all these as different kinds of identifiers:

- schema, table and column names
- variable names
- function names
- parameter names

Variables are a particularly tricky case. Some dialects like MySQL have variables with a prefix like `@foo`, so we can easily distinguish them. Other dialects like PostgreSQL also have variables, but with no special prefix: an identifier `foo` could refer to a variable, or it could refer to a table or column.
Ah, I think I mixed up variables and parameters... By variables, do you mean SQL variables defined by `CREATE VARIABLE` (in DB2)? If so, I agree that they should also be considered identifiers and converted.
Yes, variables like that.
Thanks!
Should this case transformation be applied after the generic […]? If it were applied after […]

Consider the following input query:

```sql
CREATE TABLE
actors (
ID INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
NAME VARCHAR(80) NOT NULL
);
```

Currently, by using the configuration […]

```sql
CREATE TABLE
actors (
id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
NAME VARCHAR(80) NOT NULL
);
```

If the identifier transform was applied after the […]

```sql
CREATE TABLE
actors (
id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
name VARCHAR(80) NOT NULL
);
```

Alternatives

Alternatives from #156 (comment) would be:
@karlhorky there is no before or after here. The formatter does not do multiple passes of applying the format. It just detects some words as keywords and some as identifiers, and then these get converted to upper/lower case if configured so. It will never be the case that a word is considered both an identifier and a keyword by the formatter.
OK, understood. Thanks for the clarification.
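The single-pass behavior explained above can be illustrated with a toy classifier. This is hypothetical code with a made-up keyword list, not sql-formatter's implementation: each word gets exactly one class, and only that class's case option is applied, once. Under an assumed configuration of `keywordCase: 'preserve'` and `identifierCase: 'lower'`, a keyword-classified word like `NAME` keeps its spelling while `ID` is lowercased, matching the current output in the `CREATE TABLE actors` example.

```typescript
type CaseOption = 'preserve' | 'upper' | 'lower';

// Hypothetical keyword list for the example; NAME is included on purpose
// so that the example contains a keyword-classified column name.
const KEYWORDS = new Set(['CREATE', 'TABLE', 'INTEGER', 'PRIMARY', 'KEY', 'NOT', 'NULL', 'VARCHAR', 'NAME']);

function applyCase(word: string, option: CaseOption): string {
  if (option === 'upper') return word.toUpperCase();
  if (option === 'lower') return word.toLowerCase();
  return word;
}

// One pass: each word is classified once, then only that class's option applies.
function formatWords(
  words: string[],
  options: { keywordCase: CaseOption; identifierCase: CaseOption },
): string[] {
  return words.map((word) =>
    KEYWORDS.has(word.toUpperCase())
      ? applyCase(word, options.keywordCase)
      : applyCase(word, options.identifierCase),
  );
}
```

Because classification happens before any case change, there is no ordering between `keywordCase` and `identifierCase` to choose.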
Add an option `identifierCase` to specify what case identifiers should be converted to, like the keyword case. Possible options are `preserve`, `upper` and `lower`.
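A minimal sketch of what the option could look like, mirroring the three values above. The `IdentifierCase` type, `FormatOptions` interface and `convertIdentifierCase` helper are illustrative names, not necessarily what the PR uses, and the `preserve` default is an assumption (it keeps existing output unchanged).

```typescript
// The three values described above; 'preserve' leaves identifiers untouched.
type IdentifierCase = 'preserve' | 'upper' | 'lower';

// Hypothetical options shape containing only the new setting.
interface FormatOptions {
  identifierCase: IdentifierCase;
}

// Assumed default: keep identifiers exactly as written.
const defaultOptions: FormatOptions = { identifierCase: 'preserve' };

function convertIdentifierCase(text: string, options: FormatOptions = defaultOptions): string {
  switch (options.identifierCase) {
    case 'upper':
      return text.toUpperCase();
    case 'lower':
      return text.toLowerCase();
    default:
      return text; // 'preserve'
  }
}
```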